 video shot


Convex Hull Prediction for Adaptive Video Streaming by Recurrent Learning

arXiv.org Artificial Intelligence

Adaptive video streaming relies on the construction of efficient bitrate ladders to deliver the best possible visual quality to viewers under bandwidth constraints. The traditional method of content dependent bitrate ladder selection requires a video shot to be pre-encoded with multiple encoding parameters to find the optimal operating points given by the convex hull of the resulting rate-quality curves. However, this pre-encoding step is equivalent to an exhaustive search process over the space of possible encoding parameters, which causes significant overhead in terms of both computation and time expenditure. To reduce this overhead, we propose a deep learning based method of content aware convex hull prediction. We employ a recurrent convolutional network (RCN) to implicitly analyze the spatiotemporal complexity of video shots in order to predict their convex hulls. A two-step transfer learning scheme is adopted to train our proposed RCN-Hull model, which ensures sufficient content diversity to analyze scene complexity, while also making it possible to capture the scene statistics of pristine source videos. Our experimental results reveal that our proposed model yields better approximations of the optimal convex hulls, and offers competitive time savings as compared to existing approaches. On average, the pre-encoding time was reduced by 53.8% by our method, while the average Bjontegaard delta bitrate (BD-rate) of the predicted convex hulls against ground truth was 0.26%, and the mean absolute deviation of the BD-rate distribution was 0.57%.
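
As a rough illustration of the convex hull mentioned in the abstract above, the following Python sketch selects operating points from a set of (bitrate, quality) measurements. It is not the RCN-Hull model or the authors' code: the function names, the use of a monotone-chain hull construction, and the sample numbers are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (bitrate in kbps, quality score), one per encode.
Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def rate_quality_hull(points: List[Point]) -> List[Point]:
    """Return the upper convex hull of rate-quality points, left to right.

    Encodes that lie below the hull are dropped, and so is any tail where
    spending more bitrate no longer improves quality.
    """
    hull: List[Point] = []
    for p in sorted(set(points)):
        # Keep only clockwise turns so the hull stays concave (upper side).
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    # Trim the descending tail: higher bitrate with no quality gain.
    while len(hull) >= 2 and hull[-1][1] <= hull[-2][1]:
        hull.pop()
    return hull


# Made-up measurements for one shot, pooled over several encoding settings.
encodes = [
    (500, 70), (1000, 80), (1200, 79),
    (2000, 90), (3000, 93), (3500, 92),
]
ladder = rate_quality_hull(encodes)
print(ladder)
```

In this toy data the encodes at 1200 kbps and 3500 kbps are discarded, since each is beaten in quality by a cheaper encode. The exhaustive search described in the abstract corresponds to producing every entry of `encodes` by actually encoding the shot, which is the cost a prediction model would aim to avoid.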


Solution for Point Tracking Task of ICCV 1st Perception Test Challenge 2023

arXiv.org Artificial Intelligence

This report proposes an improved method for the Tracking Any Point (TAP) task, which tracks any point on a physical surface through a video. Several existing approaches address TAP by modeling temporal relationships to obtain smooth point motion trajectories; however, they still suffer from the cumulative error caused by temporal prediction. To address this issue, we propose a simple yet effective approach called TAP with confident static points (TAPIR+), which focuses on rectifying the tracking of static points in videos shot by a static camera. Our approach contains two key components: (1) Multi-granularity Camera Motion Detection, which identifies video sequences shot by a static camera, and (2) CMR-based point trajectory prediction, which uses a moving object segmentation approach to isolate static points from moving objects. Our approach ranked first in the final test with a score of 0.46.


Deep-Learning-Based Computer Vision Approach For The Segmentation Of Ball Deliveries And Tracking In Cricket

arXiv.org Artificial Intelligence

There has been a significant increase in the adoption of technology in cricket recently. This trend has created the problem of duplicate work being done in similar computer vision-based research works. Our research tries to solve one of these problems by segmenting ball deliveries in a cricket broadcast using deep learning models, MobileNet and YOLO, thus enabling researchers to use our work as a dataset for their research. The output from our research can be used by cricket coaches and players to analyze the ball deliveries played during a match. This paper presents an approach to segment and extract video shots in which only the ball is being delivered. A video shot is a series of continuous frames that make up one whole scene of the video. Object detection models are applied to reach a high level of accuracy in correctly extracting video shots. A proof of concept for building large datasets of video shots of ball deliveries is proposed, which paves the way for further processing of those shots to extract semantics. Ball tracking in these video shots is also done using a separate RetinaNet model as a sample of the usefulness of the proposed dataset. The position on the cricket pitch where the ball lands is also extracted by tracking the ball along the y-axis. The video shot is then classified as a full-pitched, good-length or short-pitched delivery.


Google's URL2Video can turn websites into videos using AI - techAU

#artificialintelligence

Google has some sweet new artificial intelligence technology that can take elements of a website and convert them into a really slick video. In this multi-channel world we live in, brands spend an awful amount of time and money reformatting content for different platforms. A new project from Google Research, recently published on the Google AI Blog, is called URL2Video. This automatically converts a web page into a short video and the great thing is, it's capable of formatting that video in different aspect ratios, suiting both vertical and horizontal orientations. The tool interrogates the website code and walks the DOM looking for multimedia elements, headings, images, video, etc. that it can leverage to create the content.
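
To make the idea of walking a page for headings, images and video more concrete, here is a minimal Python sketch built on the standard library's `html.parser`. It is not URL2Video's implementation, which the article does not describe in detail: the class name `AssetCollector`, the helper `collect_assets`, the choice of tags, and the sample markup are all assumptions.

```python
from html.parser import HTMLParser
from typing import Dict, List


class AssetCollector(HTMLParser):
    """Collect heading text and image/video sources while parsing HTML."""

    HEADINGS = {"h1", "h2", "h3"}  # assumed set of heading tags of interest

    def __init__(self) -> None:
        super().__init__()
        self.headings: List[str] = []
        self.images: List[str] = []
        self.videos: List[str] = []
        self._in_heading = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in self.HEADINGS:
            # Start buffering the text inside this heading.
            self._in_heading = True
            self._buffer = []
        elif tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])
        elif tag == "video" and attributes.get("src"):
            self.videos.append(attributes["src"])

    def handle_data(self, data):
        if self._in_heading:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._in_heading:
            self.headings.append("".join(self._buffer).strip())
            self._in_heading = False


def collect_assets(html: str) -> Dict[str, List[str]]:
    """Parse an HTML string and return the collected assets by kind."""
    collector = AssetCollector()
    collector.feed(html)
    return {
        "headings": collector.headings,
        "images": collector.images,
        "videos": collector.videos,
    }


# Made-up page used only to exercise the sketch.
sample_page = (
    "<html><body><h1>Spring Sale</h1>"
    '<img src="hero.jpg"><video src="promo.mp4"></video>'
    "<p>Body text is ignored here.</p></body></html>"
)
assets = collect_assets(sample_page)
print(assets)
```

A real tool would also need to fetch the page, resolve relative URLs, and rank the assets by importance; none of that is attempted here.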


AI could turn your blurry phone videos into slow-mo masterpieces

#artificialintelligence

Nvidia wants to help you make awesome slow-mo videos. Nvidia wants to help you turn any old video shot on your phone into a blur-free, slow-motion masterpiece, and it's using artificial intelligence to do it. Researchers at the company have developed a new deep-learning system that can convert standard video into slow-mo by adding additional frames after the video has been shot. The result would turn a video shot at 30 frames per second (standard for a phone shooting a regular video) into something that appears as a 240 fps video. To create the slow-mo AI, researchers used 11,000 videos of sport and everyday activities shot at 240 fps to train a neural network, which learned to predict the extra frames.
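
The arithmetic behind the 30 fps to 240 fps figure can be sketched in a few lines of Python: 240 / 30 = 8, so seven new frames are needed between each pair of original frames. The linear blend below is only a naive stand-in used to show where those frames sit in time. It is not Nvidia's neural network, and the function names and the flat-list frame representation are assumptions.

```python
from typing import List


def interpolation_positions(src_fps: int, dst_fps: int) -> List[float]:
    """Fractional time positions of the new frames between two source frames."""
    if dst_fps % src_fps != 0:
        raise ValueError("this sketch only handles integer frame-rate multiples")
    factor = dst_fps // src_fps
    return [k / factor for k in range(1, factor)]


def linear_blend(frame_a: List[float], frame_b: List[float], t: float) -> List[float]:
    """Naive per-pixel blend of two frames given as flat lists of intensities."""
    return [(1.0 - t) * a + t * b for a, b in zip(frame_a, frame_b)]


positions = interpolation_positions(30, 240)
print(len(positions), positions)

# Two tiny made-up "frames" with two pixels each.
first, second = [0.0, 8.0], [8.0, 0.0]
in_between = [linear_blend(first, second, t) for t in positions]
print(in_between[0])
```

Blending pixels this way produces ghosting on moving objects, which is why a learned model that predicts the extra frames, as the article describes, is attractive.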


This Startup Wants to Help Brands Make Videos Using Artificial Intelligence

#artificialintelligence

A new startup says it can make it easier for brands to ramp up the volume and quality of their content on social media, with the help of artificial intelligence. Justin Fuisz, who founded the interactive video ad startup Fuisz Video in 2013, is rolling out Octi, a video technology company designed to help marketers pull together video shot by teams on the ground at live events and use that content to populate their various social media feeds. Marketers these days have an ever-pressing need to produce more and more content for their social media channels. But feeding that beast often requires paying multiple agencies to crank out post after post--and the quality of such content can vary widely. Instead, Octi allows marketers to shoot videos using multiple cameras at the same time and then uses artificial intelligence to automatically produce a seamlessly edited single clip to be distributed on social media, according to Mr. Fuisz.


Reading the Videos: Temporal Labeling for Crowdsourced Time-Sync Videos Based on Semantic Embedding

AAAI Conferences

Recent years have witnessed a boom in online media sharing, which raises significant challenges for effective management and retrieval. Though considerable effort has been made, precise retrieval of video shots on certain topics has been largely ignored. At the same time, due to the popularity of novel time-sync comments, or so-called "bullet-screen comments", video semantics can now be combined with timestamps to support further research on temporal video labeling. In this paper, we propose a novel video understanding framework to assign temporal labels to highlighted video shots. To be specific, due to the informal expression of bullet-screen comments, we first propose a temporal deep structured semantic model (T-DSSM) to represent comments as semantic vectors by taking advantage of their temporal correlation. Then, video highlights are recognized and labeled via semantic vectors in a supervised way. Extensive experiments on a real-world dataset show that our framework can effectively label video highlights, outperforming baselines by a significant margin, which clearly validates the potential of our framework for video understanding, as well as for the interpretation of bullet-screen comments.


Sparse Transfer Learning for Interactive Video Search Reranking

arXiv.org Machine Learning

Visual reranking is effective in improving the performance of text-based video search. However, existing reranking algorithms can only achieve limited improvement because of the well-known semantic gap between low-level visual features and high-level semantic concepts. In this paper, we adopt interactive video search reranking to bridge the semantic gap by introducing the user's labeling effort. We propose a novel dimension reduction tool, termed sparse transfer learning (STL), to effectively and efficiently encode the user's labeling information. STL is particularly designed for interactive video search reranking. Technically, it a) considers the pair-wise discriminative information to maximally separate labeled query-relevant samples from labeled query-irrelevant ones, b) achieves a sparse representation for the subspace to encode the user's intention by applying the elastic net penalty, and c) propagates the user's labeling information from labeled samples to unlabeled samples by using the data distribution knowledge. We conducted extensive experiments on the TRECVID 2005, 2006 and 2007 benchmark datasets and compared STL with popular dimension reduction algorithms. We report superior performance by using the proposed STL-based interactive video search reranking.
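
For readers unfamiliar with the elastic net penalty named in the abstract above, the Python sketch below shows its generic form, a weighted sum of an L1 term and a squared L2 term, together with the soft-thresholding step that explains why the L1 part yields exact zeros. This is not the STL objective itself: the parameter names `lam1` and `lam2` and the sample weights are assumptions, and some formulations scale the squared term by one half.

```python
from typing import List


def elastic_net_penalty(weights: List[float], lam1: float, lam2: float) -> float:
    """Generic elastic net penalty: lam1 * sum|w| + lam2 * sum(w^2)."""
    l1_term = sum(abs(w) for w in weights)
    l2_term = sum(w * w for w in weights)
    return lam1 * l1_term + lam2 * l2_term


def soft_threshold(value: float, threshold: float) -> float:
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


# Made-up projection weights for illustration.
weights = [1.0, -2.0, 0.0]
print(elastic_net_penalty(weights, lam1=0.5, lam2=0.25))

# Small coefficients are set exactly to zero, which is the source of sparsity.
print([soft_threshold(w, 0.5) for w in [0.3, 2.0, -2.0]])
```

The L1 term drives individual coefficients to zero, while the squared term keeps the solution stable when features are correlated, which is the usual motivation for combining the two.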